Smart Paper Notes

A Proposal for Amending Privacy Regulations to Tackle the Challenges Stemming from Combining Data Sets

Gábor Erdélyi , Olivia J. Erdélyi , Andreas W. Kempa-Liehr

Category: Artificial Intelligence

2021-11-26

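Modern information and communication technology practices present novel threats to privacy. We focus on some shortcomings in the ability of current data protection regulation to adequately address the ramifications of AI-driven data processing practices, in particular those of combining data sets. We propose that privacy regulation rely less on individuals' privacy expectations and recommend regulatory reform in two directions: (1) abolishing the distinction between personal and anonymized data for the purpose of triggering the application of data protection laws, and (2) developing methods to prioritize regulatory intervention based on the level of privacy risk posed by individual data processing actions. This is an interdisciplinary paper that intends to build a bridge between the various communities involved in privacy research. We put special emphasis on linking technical notions with their regulatory implications and on introducing the relevant technical and legal terminology, to foster more efficient coordination between the policymaking and technical communities and to enable a timely solution of the problems raised.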

Randomized Classifiers vs Human Decision-Makers: Trustworthy AI May Have to Act Randomly and Society Seems to Accept This

Gábor Erdélyi , Olivia J. Erdélyi , Vladimir Estivill-Castro

Categories: Artificial Intelligence | Machine Learning

2021-11-15

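As artificial intelligence (AI) systems are increasingly involved in decisions affecting our lives, ensuring that automated decision-making is fair and ethical has become a top priority. Intuitively, we feel that, akin to human decisions, the judgments of artificial agents should necessarily be grounded in some moral principles. Yet a decision-maker (whether human or artificial) can only make truly ethical (based on any ethical theory) and fair (according to any notion of fairness) decisions if full information on all the relevant factors on which the decision is based is available at the time of decision-making. This raises two problems: (1) In settings where we rely on AI systems that use classifiers obtained with supervised learning, some induction/generalization is present, and some relevant attributes may not be present even during learning. (2) Modeling such decisions as games reveals that any pure strategy, however ethical, is inevitably susceptible to exploitation. Moreover, in many games a Nash equilibrium can only be obtained by using mixed strategies; that is, to achieve mathematically optimal outcomes, decisions must be randomized. In this paper, we argue that in supervised learning settings there exist random classifiers that perform at least as well as deterministic classifiers and may hence be the optimal choice in many circumstances. We support our theoretical results with an empirical study indicating a positive societal attitude towards randomized artificial decision-makers, and we discuss some policy and implementation issues related to the use of random classifiers that are relevant for current AI policy and standardization initiatives.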
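
The game-theoretic point in this abstract can be illustrated with a small sketch. It uses matching pennies, a textbook zero-sum game chosen here purely for illustration and not taken from the paper; the function names are hypothetical. The sketch compares what a deterministic (pure) strategy and a fair-coin (mixed) strategy guarantee against an opponent who best-responds.

```python
from fractions import Fraction

# Row player's payoff in matching pennies (zero-sum): +1 on a match, -1 otherwise.
# Index 0 = heads, index 1 = tails.
PAYOFF = [[1, -1],
          [-1, 1]]

def expected_payoff(p_heads_row, p_heads_col):
    """Row player's expected payoff when each side plays heads with the given probability."""
    row = [p_heads_row, 1 - p_heads_row]
    col = [p_heads_col, 1 - p_heads_col]
    return sum(row[i] * col[j] * PAYOFF[i][j] for i in range(2) for j in range(2))

def worst_case(p_heads_row):
    """Payoff the row player is guaranteed against a best-responding opponent."""
    # The payoff is linear in the opponent's mix, so checking pure replies suffices.
    return min(expected_payoff(p_heads_row, Fraction(q)) for q in (0, 1))

pure_heads = worst_case(Fraction(1))   # deterministic: always heads
pure_tails = worst_case(Fraction(0))   # deterministic: always tails
mixed = worst_case(Fraction(1, 2))     # randomized: fair coin

print(pure_heads, pure_tails, mixed)
```

In this toy game each pure strategy can be exploited for the worst possible payoff, while the fair coin guarantees the value of the game, which is the sense in which an equilibrium may require randomization.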

Cell-Free Data Power Control Via Scalable Multi-Objective Bayesian Optimisation

Sergey S. Tambovskiy , Gábor Fodor , Hugo Tullberg

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-20

Cell-free multi-user multiple input multiple output networks are a promising alternative to classical cellular architectures, since they have the potential to provide uniform service quality and high resource utilisation over the entire coverage area of the network. To realise this potential, previous works have developed radio resource management mechanisms using various optimisation engines. In this work, we consider the problem of overall ergodic spectral efficiency maximisation in the context of uplink-downlink data power control in cell-free networks. To solve this problem in large networks, and to address convergence-time limitations, we apply scalable multi-objective Bayesian optimisation. Furthermore, we discuss how an intersection of multi-fidelity emulation and Bayesian optimisation can improve radio resource management in cell-free networks.

Hyperactive Learning (HAL) for Data-Driven Interatomic Potentials

Cas van der Oord , Matthias Sachs , Dávid Péter Kovács , Christoph Ortner , Gábor Csányi

Category: Machine Learning (Statistics)

2022-10-09

Data-driven interatomic potentials have emerged as a powerful class of surrogate models for ab initio potential energy surfaces that are able to reliably predict macroscopic properties with experimental accuracy. In generating accurate and transferable potentials the most time-consuming and arguably most important task is generating the training set, which still requires significant expert user input. To accelerate this process, this work presents hyperactive learning (HAL), a framework for formulating an accelerated sampling algorithm specifically for the task of training database generation. The key idea is to start from a physically motivated sampler (e.g., molecular dynamics) and add a biasing term that drives the system towards high uncertainty and thus to unseen training configurations. Building on this framework, general protocols for building training databases for alloys and polymers leveraging the HAL framework will be presented. For alloys, ACE potentials for AlSi10 are created by fitting to a minimal HAL-generated database containing 88 configurations (32 atoms each) with fast evaluation times of <100 microsecond/atom/cpu-core. These potentials are demonstrated to predict the melting temperature with excellent accuracy. For polymers, a HAL database is built using ACE, able to determine the density of a long polyethylene glycol (PEG) polymer formed of 200 monomer units with experimental accuracy by only fitting to small isolated PEG polymers with sizes ranging from 2 to 32.
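
The biasing idea described in this abstract can be sketched on a one-dimensional toy problem. Everything below is an assumption made for illustration: the three-member committee, the bias form (mean energy minus `tau` times the committee spread), and plain gradient descent standing in for the physically motivated sampler. It is not the authors' implementation.

```python
import statistics

# Hypothetical committee of three 1D surrogate energies. They agree at x = 0
# (standing in for the already-sampled region) and disagree more as |x| grows.
SLOPES = (-0.1, 0.0, 0.1)

def committee_energies(x):
    return [x * x + a * x for a in SLOPES]

def biased_energy(x, tau):
    """Mean committee energy minus tau times the committee spread (assumed bias form)."""
    energies = committee_energies(x)
    return statistics.mean(energies) - tau * statistics.pstdev(energies)

def relax(x, tau, step=0.1, n_steps=200, h=1e-6):
    """Plain gradient descent on the biased energy, using central differences."""
    for _ in range(n_steps):
        grad = (biased_energy(x + h, tau) - biased_energy(x - h, tau)) / (2 * h)
        x -= step * grad
    return x

x_plain = relax(1.0, tau=0.0)    # no bias: relaxes to the committee-mean minimum
x_biased = relax(1.0, tau=5.0)   # with bias: settles where the committee disagrees more

print(x_plain, x_biased)
```

With `tau = 0` the walker returns to the minimum of the mean energy, while a positive `tau` shifts its resting point towards larger committee disagreement, which is where new training configurations would be harvested in an active-learning loop.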

Tensor-reduced atomic density representations

James P. Darby , Dávid P. Kovács , Ilyes Batatia , Miguel A. Caro , Gus L. W. Hart , Christoph Ortner , Gábor Csányi

Category: Machine Learning

2022-10-02

Density based representations of atomic environments that are invariant under Euclidean symmetries have become a widely used tool in the machine learning of interatomic potentials, broader data-driven atomistic modelling and the visualisation and analysis of materials datasets. The standard mechanism used to incorporate chemical element information is to create separate densities for each element and form tensor products between them. This leads to a steep scaling in the size of the representation as the number of elements increases. Graph neural networks, which do not explicitly use density representations, escape this scaling by mapping the chemical element information into a fixed dimensional space in a learnable way. We recast this approach as tensor factorisation by exploiting the tensor structure of standard neighbour density based descriptors. In doing so, we form compact tensor-reduced representations whose size does not depend on the number of chemical elements, but remain systematically convergeable and are therefore applicable to a wide range of data analysis and regression tasks.

Two-Tailed Averaging: Anytime Adaptive Once-in-a-while Optimal Iterate Averaging for Stochastic Optimization

Gábor Melis

Categories: Machine Learning (Statistics) | Natural Language Processing | Machine Learning

2022-09-26

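Tail averaging improves on Polyak averaging's non-asymptotic behaviour by excluding a number of leading iterates of stochastic optimization from its calculations. In practice, with a finite number of optimization steps and a learning rate that cannot be annealed to zero, tail averaging can get much closer to a local minimum point of the training loss than either the individual iterates or the Polyak average. However, the number of leading iterates to ignore is an important hyperparameter, and starting averaging too early or too late leads to inefficient use of resources or suboptimal solutions. Setting this hyperparameter to improve generalization is even more difficult, especially in the presence of other hyperparameters and overfitting. Furthermore, before averaging starts, the loss is only weakly informative of the final performance, which makes early stopping unreliable. To alleviate these problems, we propose an anytime variant of tail averaging that has no hyperparameters and approximates the optimal tail at all optimization steps. Our algorithm is based on two running averages with adaptive lengths bounded in terms of the optimal tail length, one of which achieves approximate optimality with some regularity. Requiring only additional storage for two sets of weights and periodic evaluation of the loss, the proposed two-tailed averaging algorithm is a practical and widely applicable method for improving stochastic optimization.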
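
The difference between Polyak averaging and tail averaging with a fixed start point can be sketched as follows. The iterate sequence is synthetic (a decaying transient plus alternating noise around a minimum at 0) and is an assumption made for illustration. The code shows ordinary fixed-start tail averaging, not the two-tailed algorithm proposed in the paper.

```python
def running_mean(values):
    """Incrementally updated mean, as one would maintain it during optimization."""
    mean = 0.0
    for n, v in enumerate(values, start=1):
        mean += (v - mean) / n
    return mean

# Synthetic iterates: a transient decaying towards the minimum at 0, plus
# alternating noise that a non-annealed learning rate would leave behind.
iterates = [10 * 0.9 ** t + (0.5 if t % 2 == 0 else -0.5) for t in range(200)]

polyak = running_mean(iterates)        # averages everything, including the transient
tail = running_mean(iterates[100:])    # skips the first 100 iterates
last = iterates[-1]                    # the final individual iterate

# Distance from the minimum for several choices of where averaging starts.
errors = {start: abs(running_mean(iterates[start:])) for start in (0, 5, 100, 199)}
print(abs(polyak), abs(tail), abs(last))
print(errors)
```

The `errors` dictionary shows the sensitivity the abstract describes: starting too early keeps the transient in the average, and starting too late leaves too few iterates to cancel the noise.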

Object Detection Using Sim2Real Domain Randomization for Robotic Applications

Dániel Horváth , Gábor Erdős , Zoltán Istenes , Tomáš Horváth , Sándor Földi

Categories: Robotics | Computer Vision

2022-08-08

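Robots working in unstructured environments must be capable of sensing and interpreting their surroundings. One of the main obstacles to deep-learning-based models in robotics is the lack of domain-specific labeled data for different industrial applications. In this paper, we propose a sim2real transfer learning method based on domain randomization for object detection, with which labeled synthetic datasets of arbitrary size and object types can be generated automatically. Subsequently, a state-of-the-art convolutional neural network, YOLOv4, is trained to detect the different types of industrial objects. With the proposed domain randomization method, we could shrink the reality gap, achieving mAP50 scores of 86.32% and 97.38% in the zero-shot and one-shot transfer cases, respectively, on a dataset containing 190 real images. On a GeForce RTX 2080 Ti GPU, data generation takes less than 0.5 s per image and training lasts about 12 h, which makes the method convenient for industrial use. Our solution matches industrial needs, as it can reliably differentiate similar classes of objects using only one real image for training. To the best of our knowledge, this is the only work so far that satisfies these constraints.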

Archaeology of random recursive dags and Cooper-Frieze random networks

Simon Briend , Francisco Calvillo , Gábor Lugosi

Categories: Machine Learning | Machine Learning (Statistics)

2022-07-29

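We study the problem of finding the root vertex in large growing networks. We prove that it is possible to construct confidence sets whose size is independent of the number of vertices in the network and that contain the root vertex with high probability in various models of random networks. The models include uniform random recursive dags and uniform Cooper-Frieze random graphs.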

MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields

Ilyes Batatia , Dávid Péter Kovács , Gregor N. C. Simm , Christoph Ortner , Gábor Csányi

Categories: Machine Learning (Statistics) | Machine Learning

2022-06-15

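Creating fast and accurate force fields is a long-standing challenge in computational chemistry and materials science. Recently, several equivariant message passing neural networks (MPNNs) have been shown to outperform models built using other approaches in terms of accuracy. However, most MPNNs suffer from high computational cost and poor scalability. We propose that these limitations arise because MPNNs only pass two-body messages, leading to a direct relationship between the number of layers and the expressivity of the network. In this work, we introduce MACE, a new equivariant MPNN model that uses higher body order messages. In particular, we show that using four-body messages reduces the required number of message passing iterations to just two, resulting in a fast and highly parallelizable model that reaches or exceeds state-of-the-art accuracy on the rMD17, 3BPA, and AcAc benchmark tasks. We also demonstrate that using higher order messages leads to steeper learning curves.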

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation

Khuyagbaatar Batsuren , Gábor Bella , Aryaman Arora , Viktor Martinović , Kyle Gorman , Zdeněk Žabokrtský , Amarsanaa Ganbold , Šárka Dohnalová , Magda Ševčíková , Kateřina Pelegrinová

Category: Natural Language Processing

2022-06-15

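The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams; the best system averaged a 97.29% F1 score across all languages, ranging from English (93.84%) to Latin (99.38%). Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian) and received 10 system submissions from 3 teams; the best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support any type of future studies, we released all system predictions, the evaluation script, and all gold-standard datasets.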
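
To make the F1 figures concrete, the sketch below scores one predicted segmentation against a gold segmentation at the morpheme level. The multiset-overlap definition and the function name `morpheme_f1` are assumptions for illustration; the shared task's released evaluation script may define the metric differently.

```python
from collections import Counter

def morpheme_f1(gold, predicted):
    """F1 over morphemes, counting each morpheme at most as often as it occurs in both lists."""
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    overlap = sum((gold_counts & pred_counts).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = ["un", "break", "able"]

# An under-segmented prediction: only "un" matches a gold morpheme.
partial_score = morpheme_f1(gold, ["un", "breakable"])

# An exact prediction.
perfect_score = morpheme_f1(gold, ["un", "break", "able"])

print(partial_score, perfect_score)
```

Under this definition the under-segmented prediction has precision 1/2 and recall 1/3, so it is penalized on both sides rather than receiving partial credit for the unsplit stem.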
